Abstract: Distributed storage systems are mainly motivated by the limited storage capacity of individual nodes and by the reliability gained from distributing data over multiple storage nodes. The data may, however, be stored on unreliable nodes, while the end user still expects reliable access to it. Hence, when a node is damaged, a new node storing the same amount of data must be regenerated so that the number of storage nodes, and thus the previous reliability, is retained.

This requires the new node to connect to some of the existing nodes and download the required information, thereby occupying some bandwidth, called the repair bandwidth. Moreover, the cost of downloading is likely to vary across nodes. This paper investigates the theoretical cost-bandwidth tradeoff and, more importantly, demonstrates that any point on this curve can be achieved by the so-called generalized regenerating codes, an enhancement of the regenerating codes introduced by Dimakis et al. in [1].

I. Introduction 

Data in distributed storage systems should be stored reliably over long periods of time, which means that the system must survive individual node failures. To this end, the system should be able to repair itself when a node fails or leaves. Repairing a failed node requires transferring data across the network; the amount of data transferred is called the repair bandwidth. In some cases, a great deal of repair bandwidth is consumed to construct a new node.

To store data reliably, various strategies have been proposed, which basically add redundancy bits to the original data and distribute the encoded data across distinct nodes in an effective manner. The simplest strategy is replication, in which each node stores the original data file, so the data of one node suffices to reconstruct the original data. However, replication is wasteful because of the storage capacity it requires. To address this issue, in [3], instead of naive replication, an erasure code is used in which the original data file of size $M$ is divided into $k$ pieces of size $M/k$ and encoded into $n$ data fragments, each stored in one of the $n$ existing nodes. The encoding is such that access to the data stored in $k$ nodes suffices to reconstruct the original data. In other words, a new node must connect to $k$ nodes to access all the information. As a result, for a large value of $k$, the storage capacity of each node is dramatically reduced as compared to replication, since instead of storing data of size $M$, each node merely stores a fragment of size $M/k$ [4], [5]. Although the erasure code requires the same repair bandwidth as replication and adds decoding complexity to the system, it strikes a balance between system reliability and redundancy.

To take advantage of both replication (simple decoding) and erasure coding (low storage capacity), a hybrid strategy is proposed in [4]. This strategy uses a single node containing an exact replica of the original data file, together with some nodes that store erasure-coded fragments. Thus, to generate a new data fragment, the replica is used and only data of size $M/k$ is transferred across the network. Although the hybrid strategy reduces the repair bandwidth, it greatly increases the system complexity: if the replica fails, creating a new fragment is deferred until the replica is restored. This, in turn, may not be feasible when there is a stringent delay constraint.

This motivated Dimakis et al. in [1] to devise an elegant coding strategy, dubbed regenerating codes (RC), that reduces the repair bandwidth without using a replica. It is shown that, to create a new data fragment, the newcomer node should connect to $d$ nodes ($d \ge k$) and download $\beta$ bits from each of these surviving nodes. Accordingly, a trade-off between the storage per node and the repair bandwidth ($d\beta$) is identified.

Regenerating codes and other existing methods assume that surviving nodes have equal download cost, and that a new node is created by downloading the same amount of information from each surviving node. However, each node may have a different download cost. Thus, when replacing a damaged node with a new one, one may want to balance the download cost against the repair bandwidth.

The current study addresses this issue when there are two sets of nodes with different download costs, while the nodes within each set have the same cost. The material in this paper can, however, be readily extended to more general cases. Accordingly, it is assumed that a newcomer node downloads $\beta_1$ and $\beta_2$ bits, respectively, from each surviving node of cost $C_1$ and $C_2$, where $C_1 < C_2$. It will be shown later that, under certain conditions, if $\beta_1$ is larger than $\beta_2$, the total download cost is reduced at the expense of increasing the repair bandwidth. In other words, the larger $\beta_1$ is relative to $\beta_2$, the lower the download cost, while the repair bandwidth is larger than when $\beta_1 = \beta_2$. Moreover, for given $\beta_1$ and $\beta_2$, in line with what is done in [1], a trade-off between the storage per node and the repair bandwidth is identified.

Fig. 1. An example of an information flow graph when a failure has occurred (marked by cross lines) and a new node is initiated.

The rest of the paper is organized as follows. Section II briefly introduces distributed storage systems and their equivalent information flow graph, argues that network coding can approach the capacity of such systems, and then motivates and briefly introduces regenerating codes. Section III states the problem formulation, motivates the main idea, and gives an overview of the approach. Sections VI and VII present numerical results and conclude the paper, respectively.

II. Background 

A. Distributed Storage Systems and Connection to Network Coding

In distributed storage systems, nodes join or leave the network continuously; hence, the network configuration varies over time. Motivated by the pioneering work in [1], this network can be thought of as an information flow graph, a directed acyclic graph consisting of three types of nodes: (i) a single source node (S), (ii) some intermediate nodes, and (iii) data collectors (DC nodes). The source node is the source of the original data file, the intermediate nodes are storage nodes, and each data collector corresponds to a request for reconstructing the original data file. Each storage node is represented by a pair of incoming and outgoing nodes connected by a directed edge whose capacity is the storage capacity of that storage node. In this work, we simply assume that all storage nodes have capacity $\alpha$. Moreover, edges departing the storage nodes and arriving at a DC node are assumed to have infinite capacity. This reflects the fact that DC nodes have access to all the data stored in the surviving nodes they are connected to.

As mentioned earlier, the information flow graph evolves over time to reflect any change in the network. The graph starts from the source node, indicating that it is the only active node at the first step. Then, assuming the total number of storage nodes is $n$, the source node divides the original data file of size $M$ into $k$ pieces and encodes them into $n$ data fragments, each stored in one of the existing storage nodes through a direct edge of infinite capacity. When a storage node leaves the system or fails, it is replaced by a new one, called the newcomer node. The newcomer connects to $d$ active nodes out of the $n-1$ existing nodes and downloads $\beta$ bits from each. Accordingly, the information flow graph is updated by establishing $d$ directed edges of capacity $\beta$, starting from the outgoing nodes of the selected storage nodes and terminating at the incoming node of the newcomer (Fig. 1). The total information received by the newcomer node, $d\beta$, is called the repair bandwidth ($\gamma$). Finally, the data is reconstructed at each DC node by connecting to an arbitrary set of $k$ storage nodes, possibly including newcomer nodes. The edges connecting the selected storage nodes to the corresponding DC node are assumed to be of infinite capacity.

The graphical representation of distributed storage systems makes it possible to relate the storage capacity and the repair bandwidth of the original problem to characteristics of the corresponding information flow graph. Specifically, we are interested in an important quantity called the network throughput, introduced by Ahlswede et al., which identifies the maximum allowable information flow from a source to a destination node, assuming each link is subject to a limited capacity. Accordingly, it is demonstrated that, with proper coding at intermediate nodes, information can be delivered with a throughput at most equal to what is promised by the so-called min-cut theorem [6]. This is achieved through an elegant coding strategy, called network coding, which can approach the multicast capacity of such networks. Network coding thus overturned the earlier belief that a simple routing mechanism at intermediate nodes is sufficient.

B. Regenerating Codes 

As mentioned earlier, for erasure coding, access to the data of $k$ storage nodes out of the $n$ existing nodes suffices to reconstruct the original data file. Thus, the newcomer needs to connect to exactly $d = k$ nodes and download all of their stored data ($\alpha = M/k$), so $\beta = \alpha = M/k$. The repair bandwidth therefore equals the size of the data file, i.e., $\gamma = d\beta = M$. On the other hand, Dimakis et al. in [1] show that if a newcomer can connect to more than $k$ surviving nodes and download a certain function of their stored information, a lower repair bandwidth is achieved with the same storage capacity as that of erasure coding.

To this end, it is shown that computing the repair bandwidth can be translated into a multicast problem over the corresponding information flow graph, for which an optimal trade-off between the storage per node, $\alpha$, and the repair bandwidth, $\gamma$, is identified. This optimal trade-off curve includes two extremal points corresponding to the minimum storage capacity per node and the minimum repair bandwidth, respectively. Any point on the trade-off curve, including the extremal points, can be achieved by network coding. The former, minimum storage capacity, is achieved by the so-called Minimum Storage Regenerating (MSR) codes. The latter is realized by Minimum Bandwidth Regenerating (MBR) codes. The corresponding storage capacity per node ($\alpha$) and repair bandwidth ($\gamma$) for MSR and MBR codes are as follows [1]:

$$(\alpha_{MSR}, \gamma_{MSR}) = \left(\frac{M}{k}, \frac{Md}{k(d-k+1)}\right), \qquad (\alpha_{MBR}, \gamma_{MBR}) = \left(\frac{2Md}{2kd-k^2+k}, \frac{2Md}{2kd-k^2+k}\right), \tag{1}$$

where, in (1), the total data file is assumed to be of size $M$. Moreover, $d$ denotes the number of storage nodes that a newcomer connects to ($d \ge k$), and $k$ represents the number of nodes required to reconstruct the original data file. In other words, a DC node needs to connect to exactly $k$ storage nodes to reconstruct the original data file.
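The two extremal points of regenerating codes can be evaluated numerically. The following is a minimal sketch, assuming the standard MSR and MBR expressions of [1]; the function names `msr_point` and `mbr_point` are illustrative and do not come from the paper.

```python
from fractions import Fraction


def msr_point(M, k, d):
    """(alpha, gamma) at the minimum-storage point: M/k and M*d / (k*(d-k+1))."""
    M = Fraction(M)
    return M / k, M * d / (k * (d - k + 1))


def mbr_point(M, k, d):
    """(alpha, gamma) at the minimum-bandwidth point; both equal 2*M*d / (2*k*d - k^2 + k)."""
    gamma = Fraction(2 * M) * d / (2 * k * d - k * k + k)
    return gamma, gamma


if __name__ == "__main__":
    # Hypothetical parameters: unit-size file, k = 5, and several choices of d.
    for d in (5, 10, 14):
        print(d, msr_point(1, 5, d), mbr_point(1, 5, d))
```

With $d = k$ the MSR repair bandwidth equals $M$, which matches the erasure-coding case discussed above; larger $d$ lowers it.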

III. Problem Formulation and the Proposed Method

MSR and MBR codes assume that all storage nodes have the same download cost. We consider instead a more realistic situation in which storage nodes have different download costs and the download cost is of great concern. Specifically, we concentrate on the case of two sets of storage nodes, $S_1$ and $S_2$, with download costs per information bit equal to $C_1$ and $C_2$, respectively $^1$. In regenerating codes, a newcomer connects to $d$ nodes, each belonging either to $S_1$ or to $S_2$. Assuming $d_1$ nodes are of cost $C_1$ and $d_2 = d - d_1$ nodes are of cost $C_2$, the total cost of reconstructing a damaged node becomes

$$C_T = (C_1 d_1 + C_2 d_2)\beta, \tag{2}$$

where $\beta$ is the amount of information downloaded from each node. Equation (2) indicates that the same amount of information is downloaded from each node, regardless of the set it belongs to. An important question then arises: how can the repair bandwidth be balanced against the total cost? In this work, we address this question and, more importantly, establish a trade-off between the repair bandwidth, the storage capacity, and the total cost.

We employ a variation of regenerating codes, dubbed Generalized Regenerating Codes (GRC), in which the newcomer downloads a different amount of information depending on the type of storage node. During the download, there are in total $d_1$ nodes with download cost $C_1$ and $d_2$ nodes ($d_2 = d - d_1$) with download cost $C_2$ ($C_2 > C_1$), from each of which $\beta_1$ and $\beta_2$ bits are downloaded, respectively. Since $C_2 > C_1$, a lower cost can be obtained if $\beta_1 > \beta_2$. Throughout the paper, we assume $\beta_1 = k'\beta_2$ $^2$.

$^1$ This assumption keeps the problem mathematically tractable. However, one can readily follow the same approach for more general cases.

$^2$ It is worth mentioning that for some practical purposes, $k'$ should take an integer value.




Fig. 2. $\mathcal{G}^*$ for $d_1 \ge k$.



As a result, the total cost of constructing a new node in this strategy is as follows:

$$C_T = C_1 d_1 \beta_1 + C_2 d_2 \beta_2. \tag{3}$$
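The cost expressions (2) and (3) translate directly into code. The sketch below is illustrative only: the function names and the sample values of $C_1$, $C_2$ and $\beta_2$ are assumptions, not values from the paper.

```python
from fractions import Fraction


def rc_cost(C1, C2, d1, d2, beta):
    """Total download cost (2): every helper node sends beta bits."""
    return (C1 * d1 + C2 * d2) * beta


def grc_cost(C1, C2, d1, d2, beta2, k_prime):
    """Total download cost (3) with beta1 = k' * beta2."""
    beta1 = k_prime * beta2
    return C1 * d1 * beta1 + C2 * d2 * beta2


def grc_repair_bandwidth(d1, d2, beta2, k_prime):
    """Repair bandwidth gamma = d1*beta1 + d2*beta2 with beta1 = k' * beta2."""
    return (k_prime * d1 + d2) * beta2


if __name__ == "__main__":
    beta = Fraction(1, 10)  # hypothetical per-node download
    print(rc_cost(1, 3, 8, 6, beta), grc_cost(1, 3, 8, 6, beta, 2))
```

For $k' = 1$ the two cost functions coincide, which is the regenerating-code case.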



As shown in the next sections, $k'$ is inversely related to the relative download cost: under certain conditions, a larger $k'$ results in a lower cost of GRC relative to that of regenerating codes. For a given $k'$, the problem then translates to computing the value of $\beta_2$ (or equivalently $\beta_1$) for which the minimum repair bandwidth or the minimum storage capacity per node is obtained. It is shown that a further reduction in $C_T$ is possible at the expense of increasing the repair bandwidth. In the next sections, we examine the two scenarios $d_1 \ge k$ and $d_1 < k$.

IV. Scenario A: $d_1 \ge k$

Consider any given finite information flow graph $\mathcal{G}$ with a finite set of data collectors. In [1], it is argued that "If the minimum of the min-cuts separating the source with each data collector is larger or equal to the data object size M, then there exists a linear network code defined over a sufficiently large finite field F (whose size depends on the graph size) such that all data collectors can recover the data object". Fig. 2 shows the graph $\mathcal{G}^*$, a portion of the information flow graph $\mathcal{G}$ that attains the minimum of the min-cuts for $d_1 \ge k$. Referring to this flow graph and the above argument, the following condition is necessary to reconstruct the original data file:

$$\sum_{i=0}^{k-1} \min\{(d_1\beta_1 + d_2\beta_2 - i\beta_1), \alpha\} \ge M. \tag{4}$$

Thus, using (4) with $\beta_1 = k'\beta_2$, and after some manipulation, a tradeoff between $\alpha_{\min}$ (the minimum required storage) and $\beta_2$ is identified as follows:

$$\alpha_{\min}(d_1, d_2, k', \beta_2) = \begin{cases} \dfrac{M}{k}, & \beta_2 \in [f(0), \infty) \\ \dfrac{2M - g(i)\beta_2}{2(k-i)}, & \beta_2 \in [f(i), f(i-1)), \end{cases} \tag{5}$$

where

$$f(i) = \frac{2M}{2k(d_1k' + d_2 - kk') + (i+1)(2k-i)k'}, \qquad g(i) = i\,(2d_1k' + 2d_2 - 2kk' + (i+1)k'). \tag{6}$$

Thus, $\beta_{2\min}$ (the minimum required download from each node) can be computed as

$$\beta_{2\min} = f(k-1) = \frac{2M}{k(2d_1k' + 2d_2 - kk' + k')}. \tag{7}$$

In other words, for any $\alpha \ge \alpha_{\min}(d_1, d_2, k', \beta_2)$, the points $(n, k, d_1, d_2, \alpha, \beta_1, \beta_2)$ are achievable with linear network coding.
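The piecewise expression for $\alpha_{\min}$ can be checked against the min-cut condition it was derived from. The sketch below assumes the definitions of $f(i)$ and $g(i)$ given in the appendix and lets $i$ run over $1, \dots, k-1$ in the second branch; all function names are illustrative.

```python
from fractions import Fraction


def f(i, M, k, d1, d2, kp):
    """Breakpoint f(i) = 2M / (2k(d1 k' + d2 - k k') + (i+1)(2k-i) k')."""
    return Fraction(2 * M) / (2 * k * (d1 * kp + d2 - k * kp) + (i + 1) * (2 * k - i) * kp)


def g(i, k, d1, d2, kp):
    """g(i) = i (2 d1 k' + 2 d2 - 2 k k' + (i+1) k')."""
    return i * (2 * d1 * kp + 2 * d2 - 2 * k * kp + (i + 1) * kp)


def alpha_min(beta2, M, k, d1, d2, kp):
    """Closed-form minimum storage for Scenario A; None if beta2 < f(k-1)."""
    beta2 = Fraction(beta2)
    if beta2 >= f(0, M, k, d1, d2, kp):
        return Fraction(M) / k
    for i in range(1, k):
        if f(i, M, k, d1, d2, kp) <= beta2 < f(i - 1, M, k, d1, d2, kp):
            return (2 * M - g(i, k, d1, d2, kp) * beta2) / (2 * (k - i))
    return None  # below beta2_min: no storage size satisfies the cut condition


def cut_value(alpha, beta2, k, d1, d2, kp):
    """Left-hand side of the min-cut condition with beta1 = k' * beta2."""
    return sum(min((d1 * kp + d2 - i * kp) * beta2, alpha) for i in range(k))


if __name__ == "__main__":
    M, k, d1, d2, kp = 1, 5, 8, 6, 2  # hypothetical parameters
    b = Fraction(1, 80)
    a = alpha_min(b, M, k, d1, d2, kp)
    print(a, cut_value(a, b, k, d1, d2, kp))
```

At the returned $\alpha_{\min}$ the cut value equals $M$ exactly, which is what the derivation requires.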

Thus, the tradeoff curve between the storage capacity ($\alpha$) and the repair bandwidth ($\gamma = \beta_1 d_1 + \beta_2 d_2$) can be established using (5), where $\beta_1 = k'\beta_2$. This curve has two extremal points: one corresponds to the minimum storage capacity and the other to the minimum repair bandwidth. We call the codes that achieve these points Generalized Minimum Storage Regenerating (GMSR) and Generalized Minimum Bandwidth Regenerating (GMBR) codes, respectively. GMSR is identified by the following storage capacity-repair bandwidth pair:

$$(\alpha_{GMSR}, \gamma_{GMSR}) = \left(\frac{M}{k}, \frac{M(d_2 + k'd_1)}{k(d_1k' + d_2 - kk' + k')}\right). \tag{8}$$

Similarly, for GMBR, we arrive at the following:

$$(\alpha_{GMBR}, \gamma_{GMBR}) = \left(\frac{2M(d_2 + k'd_1)}{k(2d_1k' + 2d_2 - kk' + k')}, \frac{2M(d_2 + k'd_1)}{k(2d_1k' + 2d_2 - kk' + k')}\right). \tag{9}$$

It can be verified that for the special case of $k' = 1$ and $d = d_1 + d_2$, equations (8) and (9) reduce to the storage capacity-repair bandwidth pairs of MSR and MBR codes [1], respectively. Also, for $k' \to \infty$, noting $\beta_2 = \beta_1/k'$, one concludes that $\beta_2 = 0$; hence, only the nodes with lower download cost are used for downloading. Accordingly, $\left(\frac{M}{k}, \frac{Md_1}{k(d_1-k+1)}\right)$ and $\left(\frac{2Md_1}{k(2d_1-k+1)}, \frac{2Md_1}{k(2d_1-k+1)}\right)$ are the storage capacity-repair bandwidth pairs of the resulting GMSR and GMBR codes. As expected, referring to (1), these pairs coincide with those of MSR and MBR codes with $d = d_1$.
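The two special cases mentioned above ($k' = 1$ and $k' \to \infty$) can be checked numerically. This is a small sketch under the Scenario A expressions, where the GMSR point uses $\beta_2 = f(0)$ and the GMBR point uses $\beta_2 = f(k-1)$; the function names and parameter values are illustrative.

```python
from fractions import Fraction


def gmsr_point(M, k, d1, d2, kp):
    """(alpha, gamma) of GMSR for d1 >= k."""
    gamma = Fraction(M) * (d2 + kp * d1) / (k * (d1 * kp + d2 - k * kp + kp))
    return Fraction(M) / k, gamma


def gmbr_point(M, k, d1, d2, kp):
    """(alpha, gamma) of GMBR for d1 >= k; storage equals repair bandwidth."""
    gamma = Fraction(2 * M) * (d2 + kp * d1) / (k * (2 * d1 * kp + 2 * d2 - k * kp + kp))
    return gamma, gamma


if __name__ == "__main__":
    M, k, d1, d2 = 1, 5, 8, 6  # hypothetical parameters
    for kp in (1, 2, 4, 10**6):
        print(kp, gmsr_point(M, k, d1, d2, kp), gmbr_point(M, k, d1, d2, kp))
```

For $k' = 1$ the output matches the MSR and MBR points with $d = 14$, and for very large $k'$ it approaches the MSR and MBR points with $d = d_1 = 8$.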

A. Comparison between GMSR and MSR when $d_1 \ge k$

Referring to (8) and the storage capacity-repair bandwidth pair of MSR given in (1), GMSR and MSR yield the same storage capacity per node. However, they exhibit different repair bandwidths. To compare the repair bandwidths of GMSR and MSR, we define the bandwidth ratio $\rho_{MSR}(k')$ as follows:

$$\rho_{MSR}(k') \triangleq \frac{\gamma_{GMSR}(k')}{\gamma_{MSR}} = \frac{(d_2 + k'd_1)(d-k+1)}{d\,(d_1k' + d_2 - kk' + k')}. \tag{10}$$

It can be verified that as long as $d \ge k$ and $k > 1$, the derivative of (10) with respect to $k'$ is positive. As these conditions hold here, $\rho_{MSR}(k')$ is an increasing function of $k'$ and, since $\rho_{MSR}(1) = 1$, $\rho_{MSR}(k')$ is greater than one for $k' > 1$. Thus, the repair bandwidth of GMSR is greater than that of MSR. Moreover, we define the download cost ratio $\eta_{MSR}(k')$ to compare the download cost of GMSR to that of MSR, as follows:

$$\eta_{MSR}(k') \triangleq \frac{C_{T_{GMSR}}(k')}{C_{T_{MSR}}} = \frac{(C_1d_1k' + C_2d_2)(d-k+1)}{(d_1k' + d_2 - kk' + k')(C_1d_1 + C_2d_2)}. \tag{11}$$

Note that $\eta_{MSR}(1) = 1$. For $C_{T_{GMSR}}$ to be lower than $C_{T_{MSR}}$, $\eta_{MSR}(k')$ should be a decreasing function, i.e., have a negative derivative with respect to $k'$. Taking the derivative of (11), one can verify that the following condition should be satisfied:

$$\frac{C_2}{C_1} > \frac{d_1}{d_1-k+1}. \tag{12}$$

It is worth mentioning that if the above condition holds, the minimum value of $\eta_{MSR}$ is approached as $k'$ tends to infinity, i.e.,

$$\eta_{MSR}(+\infty) = \frac{C_1d_1(d-k+1)}{(d_1-k+1)(C_1d_1 + C_2d_2)}.$$
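The behaviour of the two ratios around the cost threshold can be illustrated as follows. The sketch assumes the Scenario A expressions for the bandwidth ratio, the cost ratio and the threshold on $C_2/C_1$; the function names and the sample costs are hypothetical.

```python
from fractions import Fraction


def rho_msr(kp, k, d1, d2):
    """Repair bandwidth ratio of GMSR to MSR for d1 >= k."""
    d = d1 + d2
    return Fraction((d2 + kp * d1) * (d - k + 1), d * (d1 * kp + d2 - k * kp + kp))


def eta_msr(kp, k, d1, d2, C1, C2):
    """Download cost ratio of GMSR to MSR for d1 >= k."""
    d = d1 + d2
    num = (C1 * d1 * kp + C2 * d2) * (d - k + 1)
    den = (d1 * kp + d2 - k * kp + kp) * (C1 * d1 + C2 * d2)
    return Fraction(num) / Fraction(den)


def cost_threshold_msr(k, d1):
    """GMSR is cheaper than MSR only if C2/C1 exceeds d1 / (d1 - k + 1)."""
    return Fraction(d1, d1 - k + 1)


if __name__ == "__main__":
    k, d1, d2 = 5, 8, 6
    print(cost_threshold_msr(k, d1))
    for C2 in (Fraction(3, 2), 2, 3):  # below, at, and above the threshold
        print(C2, [eta_msr(kp, k, d1, d2, 1, C2) for kp in (1, 2, 4)])
```

With $C_2/C_1$ exactly at the threshold the cost ratio stays at one for every $k'$, which is the boundary between the two regimes.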

B. Comparison between GMBR and MBR when $d_1 \ge k$

Equations (1) and (9) indicate that the storage per node equals the repair bandwidth for both MBR and GMBR codes. As a result, any finding for the repair bandwidths of MBR and GMBR codes also applies to the storage per node. In this regard, we define the repair bandwidth ratio $\rho_{MBR}(k')$ as follows:

$$\rho_{MBR}(k') \triangleq \frac{\gamma_{GMBR}(k')}{\gamma_{MBR}} = \frac{(d_2 + k'd_1)(2d-k+1)}{d\,(2d_1k' + 2d_2 - kk' + k')}. \tag{13}$$

Obviously, $\rho_{MBR}(1) = 1$. Following the same approach as in Section IV-A, one can readily verify that if $k > 1$ and $k' > 1$, $\rho_{MBR}(k')$ is always greater than one. Thus, the repair bandwidth of GMBR is greater than that of MBR. Accordingly, we define the download cost ratio as follows:

$$\eta_{MBR}(k') \triangleq \frac{C_{T_{GMBR}}(k')}{C_{T_{MBR}}} = \frac{(C_1d_1k' + C_2d_2)(2d-k+1)}{(2d_1k' + 2d_2 - kk' + k')(C_1d_1 + C_2d_2)}. \tag{14}$$

Note that $\eta_{MBR}(1) = 1$. Taking the derivative of $\eta_{MBR}(k')$ with respect to $k'$, one can verify that to have $\eta_{MBR}(k') < 1$, the following condition should be satisfied:

$$\frac{C_2}{C_1} > \frac{2d_1}{2d_1-k+1}. \tag{15}$$

It is worth mentioning that the minimum value of $\eta_{MBR}$ is approached as $k'$ tends to infinity, i.e.,

$$\eta_{MBR}(+\infty) = \frac{C_1d_1(2d-k+1)}{(2d_1-k+1)(C_1d_1 + C_2d_2)}.$$
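The same check can be carried out for the minimum-bandwidth point. The sketch below assumes the Scenario A GMBR-to-MBR ratios and the corresponding threshold on $C_2/C_1$; names and sample values are illustrative.

```python
from fractions import Fraction


def rho_mbr(kp, k, d1, d2):
    """Repair bandwidth ratio of GMBR to MBR for d1 >= k."""
    d = d1 + d2
    return Fraction((d2 + kp * d1) * (2 * d - k + 1),
                    d * (2 * d1 * kp + 2 * d2 - k * kp + kp))


def eta_mbr(kp, k, d1, d2, C1, C2):
    """Download cost ratio of GMBR to MBR for d1 >= k."""
    d = d1 + d2
    num = (C1 * d1 * kp + C2 * d2) * (2 * d - k + 1)
    den = (2 * d1 * kp + 2 * d2 - k * kp + kp) * (C1 * d1 + C2 * d2)
    return Fraction(num) / Fraction(den)


def cost_threshold_mbr(k, d1):
    """GMBR is cheaper than MBR only if C2/C1 exceeds 2 d1 / (2 d1 - k + 1)."""
    return Fraction(2 * d1, 2 * d1 - k + 1)


if __name__ == "__main__":
    k, d1, d2 = 5, 8, 6
    print(cost_threshold_mbr(k, d1), rho_mbr(2, k, d1, d2), eta_mbr(2, k, d1, d2, 1, 2))
```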
Fig. 3. $\mathcal{G}^*$ for $d_1 < k$.



V. Scenario B: $d_1 < k$

In this case, the information flow graph $\mathcal{G}^*$ has a minimum min-cut similar to what is shown in Fig. 3. As a result, according to the min-cut theorem addressed in Section IV, the following condition should be satisfied:

$$\sum_{i=0}^{d_1-1} \min\{(d_1\beta_1 + d_2\beta_2 - i\beta_1), \alpha\} + \sum_{i=d_1}^{k-1} \min\{(d_1 + d_2 - i)\beta_2, \alpha\} \ge M. \tag{16}$$

The above condition introduces a tradeoff between $\alpha$ and $\beta_2$, which is computed as follows:

$$\alpha_{\min}(d_1, d_2, k', \beta_2) = \begin{cases} \dfrac{M}{k}, & \beta_2 \in [f_1(0), \infty) \\ \dfrac{2M - g_1(i)\beta_2}{2(k-i)}, & \beta_2 \in [f_1(i), f_1(i-1)) \\ \dfrac{2M - (g_1(k-d_1-1) + g_2(i))\beta_2}{2(d_1-i)}, & \beta_2 \in [f_2(i), f_2(i-1)), \end{cases}$$

where

$$f_1(i) = \frac{2M}{2k(d-k) + (i+1)(2k-i)}, \qquad f_2(i) = \frac{2M}{(2kd - k^2 - d_1^2 - d_1 + k + 2d_1k') + ik'(2d_1 - i - 1)},$$
$$g_1(i) = i\,(2d - 2k + i + 1), \qquad g_2(i) = (i+1)(2d_2 + ik'). \tag{17}$$

Thus, $\beta_{2\min}$ can be computed as

$$\beta_{2\min} = f_2(d_1 - 1) = \frac{2M}{2kd - k^2 + k + (d_1^2 + d_1)(k' - 1)}. \tag{18}$$

Accordingly, GMSR and GMBR, the two extremal points of the trade-off curve, have the following storage capacity-repair bandwidth pairs:

$$(\alpha_{GMSR}, \gamma_{GMSR}) = \left(\frac{M}{k}, \frac{M(d_1k' + d_2)}{k(d-k+1)}\right), \tag{19}$$

$$(\alpha_{GMBR}, \gamma_{GMBR}) = \left(\frac{2M(d_1k' + d_2)}{2kd - k^2 + k + (d_1^2 + d_1)(k'-1)}, \frac{2M(d_1k' + d_2)}{2kd - k^2 + k + (d_1^2 + d_1)(k'-1)}\right). \tag{20}$$

A. Comparison between GMSR and MSR when $d_1 < k$

Referring to (1) and (19), GMSR and MSR have equal storage capacity per node. To gain insight into the repair bandwidth, we define the following repair bandwidth ratio:

$$\rho_{MSR}(k') \triangleq \frac{\gamma_{GMSR}(k')}{\gamma_{MSR}} = \frac{d_1k' + d_2}{d}. \tag{21}$$

This ratio is always greater than one for $k' > 1$, meaning that the GMSR code imposes a larger bandwidth on the system than MSR. Similarly, the download cost ratio is defined as

$$\eta_{MSR}(k') \triangleq \frac{C_{T_{GMSR}}(k')}{C_{T_{MSR}}} = \frac{C_1d_1k' + C_2d_2}{C_1d_1 + C_2d_2}. \tag{22}$$

Again, $\eta_{MSR}(k')$ is greater than one for all $k' > 1$. With a larger repair bandwidth and a higher download cost at the same storage capacity, GMSR offers no advantage over MSR. Thus, GMSR does not perform well for $d_1 < k$, and in this case it is better to set $\beta_1 = \beta_2$ (the MSR approach).

B. Comparison between GMBR and MBR when $d_1 < k$

As the storage per node equals the repair bandwidth for both MBR and GMBR codes, we concentrate on the repair bandwidth. Again, we define the repair bandwidth ratio $\rho_{MBR}(k')$ as follows:

$$\rho_{MBR}(k') \triangleq \frac{\gamma_{GMBR}(k')}{\gamma_{MBR}} = \frac{(d_1k' + d_2)(2kd - k^2 + k)}{d\,(2kd - k^2 + k + (d_1^2 + d_1)(k'-1))}. \tag{23}$$

$\rho_{MBR}(k')$ has a positive derivative with respect to $k'$ and, since $\rho_{MBR}(1) = 1$, it follows that $\rho_{MBR}(k') > 1$ for $k' > 1$. Thus, MBR outperforms GMBR in terms of repair bandwidth. Similarly, we define the download cost ratio as follows:

$$\eta_{MBR}(k') \triangleq \frac{C_{T_{GMBR}}(k')}{C_{T_{MBR}}} = \frac{(C_1d_1k' + C_2d_2)(2kd - k^2 + k)}{(C_1d_1 + C_2d_2)(2kd - k^2 + k + (d_1^2 + d_1)(k'-1))}. \tag{24}$$

Again, for the download cost of GMBR to be lower than that of MBR, the following condition should be satisfied:

$$\frac{C_2}{C_1} > \frac{2kd - k^2 + k - d_1^2 - d_1}{d_2(d_1 + 1)}. \tag{25}$$

In this case, the minimum value of $\eta$ is approached as $k'$ tends to infinity, i.e., $\eta_{MBR}(+\infty) = \dfrac{C_1(2kd - k^2 + k)}{(d_1 + 1)(C_1d_1 + C_2d_2)}$.
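The Scenario B quantities for the minimum-bandwidth point can be evaluated in the same way. The sketch below assumes $\beta_{2\min} = 2M / (2kd - k^2 + k + (d_1^2 + d_1)(k' - 1))$ and the ratios that follow from it; the function names and sample costs are illustrative.

```python
from fractions import Fraction


def beta2_min_b(M, k, d1, d2, kp):
    """Minimum per-node download from the costlier set when d1 < k."""
    d = d1 + d2
    return Fraction(2 * M) / (2 * k * d - k * k + k + (d1 * d1 + d1) * (kp - 1))


def rho_mbr_b(kp, k, d1, d2):
    """Repair bandwidth ratio of GMBR to MBR when d1 < k."""
    d = d1 + d2
    base = 2 * k * d - k * k + k
    return Fraction((d1 * kp + d2) * base, d * (base + (d1 * d1 + d1) * (kp - 1)))


def eta_mbr_b(kp, k, d1, d2, C1, C2):
    """Download cost ratio of GMBR to MBR when d1 < k."""
    d = d1 + d2
    base = 2 * k * d - k * k + k
    num = (C1 * d1 * kp + C2 * d2) * base
    den = (C1 * d1 + C2 * d2) * (base + (d1 * d1 + d1) * (kp - 1))
    return Fraction(num) / Fraction(den)


def cost_threshold_mbr_b(k, d1, d2):
    """GMBR is cheaper than MBR only if C2/C1 exceeds this value."""
    d = d1 + d2
    return Fraction(2 * k * d - k * k + k - d1 * d1 - d1, d2 * (d1 + 1))


if __name__ == "__main__":
    k, d1, d2 = 5, 4, 10
    print(cost_threshold_mbr_b(k, d1, d2), beta2_min_b(1, k, d1, d2, 2))
    print(rho_mbr_b(2, k, d1, d2), eta_mbr_b(2, k, d1, d2, 1, 3))
```

At $\beta_{2\min}$ the edge capacities in the min-cut sum add up to exactly $M$, so no smaller download can satisfy the cut condition.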
Fig. 4. The tradeoff curves between the relative cost and repair bandwidth ratio for GMSR code.



Fig. 5. The tradeoff curves between the relative cost and repair bandwidth 
ratio for GMBR code. 



VI. Numerical Results 

This section provides numerical results that give insight into the proposed GMSR and GMBR codes and their advantages in terms of storage capacity and/or repair bandwidth as compared to the MSR and MBR codes. Fig. 4 illustrates $\rho(k')$ versus $\eta(k')$ for the GMSR code, for different integer values of $k'$ in the interval $[1, 20]$ and for different relative cost ratios $C_2/C_1$. Moreover, it is assumed that $(n, k, d_1, d_2) = (15, 5, 8, 6)$, which corresponds to Scenario A, since $d_1 \ge k$. According to condition (12), in this example, if $\frac{C_2}{C_1} > \frac{d_1}{d_1-k+1} = 2$, the download cost of GMSR is lower than that of MSR ($\eta(k') < 1$). This is in accordance with Fig. 4. Moreover, Fig. 4 depicts the increase in repair bandwidth for a given download cost ratio. Similarly, Fig. 5 provides the same result for GMBR with the same parameters, i.e., $(n, k, d_1, d_2) = (15, 5, 8, 6)$. Again, referring to (15), $\eta(k') < 1$ for $\frac{C_2}{C_1} > \frac{2d_1}{2d_1-k+1} = \frac{4}{3}$, which is in accordance with Fig. 5.

Fig. 6 depicts $\rho(k')$ versus $\eta(k')$ for GMBR when $(n, k, d_1, d_2) = (15, 5, 4, 10)$. Since $d_1 < k$, this case belongs to Scenario B. Referring to (25), if $\frac{C_2}{C_1} > \frac{2kd - k^2 + k - d_1^2 - d_1}{d_2(d_1+1)} = 2$, the download cost of GMBR is lower than that of MBR ($\eta(k') < 1$). Fig. 6 confirms this threshold for $C_2/C_1$. Moreover, it shows how the download cost ratio affects the repair bandwidth ratio ($\rho(k')$).
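Curves of the kind plotted in these figures can be generated by sweeping $k'$ over $1, \dots, 20$. The sketch below does this for GMSR in Scenario A with $(k, d_1, d_2) = (5, 8, 6)$; the cost ratios used are illustrative assumptions and the paper's plotted values are not reproduced here.

```python
from fractions import Fraction


def gmsr_ratios(kp, k, d1, d2, cost_ratio):
    """(rho, eta) for GMSR with d1 >= k, taking C1 = 1 and C2 = cost_ratio."""
    d = d1 + d2
    denom = d1 * kp + d2 - k * kp + kp
    rho = Fraction((d2 + kp * d1) * (d - k + 1), d * denom)
    eta = Fraction(d1 * kp + cost_ratio * d2) * (d - k + 1) / (denom * (d1 + cost_ratio * d2))
    return rho, eta


def tradeoff_curve(k, d1, d2, cost_ratio, kp_values=range(1, 21)):
    """List of (rho, eta) points, one per integer k'."""
    return [gmsr_ratios(kp, k, d1, d2, cost_ratio) for kp in kp_values]


if __name__ == "__main__":
    for ratio in (Fraction(3, 2), 2, 3, 5):  # hypothetical values of C2/C1
        curve = tradeoff_curve(5, 8, 6, ratio)
        print(ratio, [(float(r), float(e)) for r, e in curve[:3]])
```

Each curve starts at $(1, 1)$ for $k' = 1$; above the threshold $C_2/C_1 = 2$ the cost ratio falls as the bandwidth ratio grows, and below it both grow.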

Also, the tradeoff curves between the storage capacity per node and the repair bandwidth for RC and GRC codes, for two values of $k'$ ($k' = 2, 4$), are shown in Fig. 7. The storage capacity-repair bandwidth tradeoff curve of the RC code outperforms that of GRC (the dotted curve), while, as noted before, GRC may result in a lower download cost than the RC code.

Finally, Fig. 8 shows the impact of different values of $k'$ on $\eta$ for different values of $C_2/C_1$. Fig. 8 confirms that, under the conditions given in the preceding sections, $\eta(k')$ is a decreasing function of $k'$.

VII. Conclusion 

This paper addresses the cost-bandwidth tradeoff in distributed storage systems when the download costs of storage nodes are not the same. Specifically, we concentrate on the case of two sets of nodes with different download costs. Using the corresponding information flow graph, a new variation of regenerating codes, called generalized regenerating codes, is proposed and shown, under certain conditions, to outperform existing regenerating codes in terms of download cost, at the price of a marginal increase in the repair bandwidth.

Fig. 6. The tradeoff curves between the relative cost and bandwidth ratio for GMBR code.

Fig. 7. The tradeoff curves between the storage per node and repair bandwidth.

Fig. 8. The effect of $k'$ on the relative cost.
VIII. Appendix

To derive the optimal tradeoff between $\alpha$ and $\beta_2$, one can fix $\beta_2$ and $d_1, d_2, k'$ (to some integer values) and then find the minimum value of $\alpha$ such that (4) (for $d_1 \ge k$) or (16) (for $d_1 < k$) is satisfied. To this end, we define $\alpha_{\min}$ as follows:

$$\alpha_{\min}(d_1, d_2, k', \beta_2) = \min \alpha \quad \text{subject to: } C \ge M, \tag{26}$$

where, depending on whether $d_1 \ge k$ or $d_1 < k$, we have

$$C = \sum_{i=0}^{k-1} \min\{(d_1\beta_1 + d_2\beta_2 - i\beta_1), \alpha\} \quad \text{for } d_1 \ge k,$$

$$C = \sum_{i=0}^{d_1-1} \min\{(d_1k' + d_2 - ik')\beta_2, \alpha\} + \sum_{i=d_1}^{k-1} \min\{(d_1 + d_2 - i)\beta_2, \alpha\} \quad \text{for } d_1 < k. \tag{27}$$

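The result for $d_1 \ge k$:

To prove (5), substituting $\beta_1 = k'\beta_2$ into the corresponding $C$ (equation (27) with $d_1 \ge k$), it follows that

$$C = \sum_{i=0}^{k-1} \min\{(d_1\beta_1 + d_2\beta_2 - i\beta_1), \alpha\} = \sum_{i=0}^{k-1} \min\{(d_1k' + d_2 - ik')\beta_2, \alpha\} \ge M. \tag{28}$$

Thus, $C$ can be computed, assuming $\alpha$ belongs to one of the following intervals:

$$C(\alpha) = \begin{cases} k\alpha, & \alpha \in [0, h(1)\beta_2] \\ (k-1)\alpha + h(1)\beta_2, & \alpha \in (h(1)\beta_2, h(2)\beta_2] \\ \quad\vdots \\ (k-j)\alpha + \sum_{i=1}^{j} h(i)\beta_2, & \alpha \in (h(j)\beta_2, h(j+1)\beta_2] \\ \quad\vdots \\ \alpha + \sum_{i=1}^{k-1} h(i)\beta_2, & \alpha \in (h(k-1)\beta_2, h(k)\beta_2], \end{cases} \tag{29}$$

where

$$h(i) = d_1k' + d_2 - (k-i)k'.$$

As a result, noting $C \ge M$, it follows that

$$\alpha_{\min} = \begin{cases} \dfrac{M}{k}, & M \in [0, k\,h(1)\beta_2] \\ \dfrac{M - \sum_{i=1}^{j} h(i)\beta_2}{k-j}, & M \in \left((k-j)h(j)\beta_2 + \sum_{i=1}^{j} h(i)\beta_2,\; (k-j)h(j+1)\beta_2 + \sum_{i=1}^{j} h(i)\beta_2\right], \end{cases} \quad j = 0, 1, \dots, k-1,$$

or equivalently,

$$\alpha_{\min} = \begin{cases} \dfrac{M}{k}, & \beta_2 \in \left[\dfrac{M}{k\,h(1)}, \infty\right) \\ \dfrac{M - \sum_{i=1}^{j} h(i)\beta_2}{k-j}, & \beta_2 \in \left[\dfrac{M}{(k-j)h(j+1) + \sum_{i=1}^{j} h(i)},\; \dfrac{M}{(k-j)h(j) + \sum_{i=1}^{j} h(i)}\right). \end{cases}$$

As a result, noting (29), it follows that

$$\alpha_{\min} = \begin{cases} \dfrac{M}{k}, & \beta_2 \in [f(0), \infty) \\ \dfrac{2M - g(i)\beta_2}{2(k-i)}, & \beta_2 \in [f(i), f(i-1)), \quad i = 0, 1, \dots, k-1, \end{cases}$$

where

$$f(i) = \frac{2M}{2k\,h(0) + (i+1)(2k-i)k'}, \tag{30}$$

$$g(i) = i\,(2d_1k' + 2d_2 - 2kk' + (i+1)k'), \tag{31}$$

$$\beta_{2\min} = f(k-1). \tag{32}$$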
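The inversion of the piecewise-linear function $C(\alpha)$ used in this derivation can also be carried out generically, without the closed forms. The following sketch sorts the edge-capacity terms of (27) and solves $C(\alpha) = M$ segment by segment; it covers both scenarios. The function names are illustrative, and $M > 0$ is assumed.

```python
from fractions import Fraction


def min_cut_terms(k, d1, d2, kp, beta2):
    """Capacity terms inside the min{., alpha} of (27), with beta1 = k' * beta2."""
    beta2 = Fraction(beta2)
    first = [(d1 * kp + d2 - i * kp) * beta2 for i in range(min(d1, k))]
    second = [(d1 + d2 - i) * beta2 for i in range(d1, k)]  # empty when d1 >= k
    return first + second


def alpha_min(M, terms):
    """Smallest alpha with sum(min(t, alpha) for t in terms) >= M, or None if infeasible."""
    M = Fraction(M)
    ts = sorted(terms)
    if sum(ts) < M:
        return None  # even alpha = infinity cannot reach M
    saturated = Fraction(0)  # sum of the terms already smaller than alpha
    for j, t in enumerate(ts):
        remaining = len(ts) - j  # terms that still contribute alpha each
        alpha = (M - saturated) / remaining
        if alpha <= t:
            return alpha
        saturated += t
    return None


if __name__ == "__main__":
    # Hypothetical parameters: Scenario A, then Scenario B at its minimum beta2.
    print(alpha_min(1, min_cut_terms(5, 8, 6, 2, Fraction(1, 80))))
    print(alpha_min(1, min_cut_terms(5, 4, 10, 2, Fraction(1, 70))))
```

In the second call the result equals $(d_1k' + d_2)\beta_2$, i.e., the storage equals the repair bandwidth, as expected at the minimum-bandwidth point.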
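The result for $d_1 < k$:

In this case, substituting $\beta_1 = k'\beta_2$ into $C$ (equation (27) with $d_1 < k$) and following the same approach as for $d_1 \ge k$, it follows that

$$C(\alpha) = \begin{cases} k\alpha, & \alpha \in [0, h(1,0)\beta_2] \\ (k-1)\alpha + h(1,0)\beta_2, & \alpha \in (h(1,0)\beta_2, h(2,0)\beta_2] \\ \quad\vdots \\ (k-j)\alpha + \sum_{i=1}^{j} h(i,0)\beta_2, & \alpha \in (h(j,0)\beta_2, h(j+1,0)\beta_2] \\ \quad\vdots \\ d_1\alpha + \sum_{i=1}^{k-d_1} h(i,0)\beta_2, & \alpha \in (h(k-d_1,0)\beta_2, h(k-d_1,1)\beta_2] \\ A, & \alpha \in (h(k-d_1,1)\beta_2, h(k-d_1,2)\beta_2] \\ \quad\vdots \\ B, & \alpha \in (h(k-d_1,t)\beta_2, h(k-d_1,t+1)\beta_2] \\ \quad\vdots \\ D, & \alpha \in (h(k-d_1,d_1-1)\beta_2, h(k-d_1,d_1)\beta_2], \end{cases}$$

where

$$h(x, y) = d_1 + d_2 - k + x + yk',$$

$$A = (d_1 - 1)\alpha + \sum_{i=1}^{k-d_1} h(i,0)\beta_2 + h(k-d_1,1)\beta_2,$$

$$B = (d_1 - t)\alpha + \sum_{i=1}^{k-d_1} h(i,0)\beta_2 + \sum_{i=1}^{t} h(k-d_1,i)\beta_2,$$

$$D = \alpha + \sum_{i=1}^{k-d_1} h(i,0)\beta_2 + \sum_{i=1}^{d_1-1} h(k-d_1,i)\beta_2,$$

$$C(\alpha_{\min}) = M. \tag{33}$$

Thus, $\alpha_{\min} = C^{-1}(M)$ can be computed as

$$\alpha_{\min}(d_1, d_2, k', \beta_2) = \begin{cases} \dfrac{M}{k}, & \beta_2 \in [f_1(0), \infty) \\ \dfrac{2M - g_1(i)\beta_2}{2(k-i)}, & \beta_2 \in [f_1(i), f_1(i-1)) \\ \dfrac{2M - (g_1(k-d_1-1) + g_2(i))\beta_2}{2(d_1-i)}, & \beta_2 \in [f_2(i), f_2(i-1)), \end{cases}$$

where

$$f_1(i) = \frac{2M}{2k(d-k) + (i+1)(2k-i)}, \qquad f_2(i) = \frac{2M}{(2kd - k^2 - d_1^2 - d_1 + k + 2d_1k') + ik'(2d_1 - i - 1)},$$
$$g_1(i) = i\,(2d - 2k + i + 1), \qquad g_2(i) = (i+1)(2d_2 + ik'). \tag{34}$$

Finally, $\beta_{2\min}$ can be computed as

$$\beta_{2\min} = f_2(d_1 - 1). \tag{35}$$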



